Manual flow cytometry analysis with Python¶

Table of contents¶

  1. Why analyse flow cytometry data in Python?
  2. Working with single .fcs files: the sample class
    1. Reading and exploring an .fcs file
    2. Transforming parameter values
  3. Manual gating
    1. Rectangle gates
    2. Polygon gates
    3. Ellipsoid gates
    4. Quadrant gates
  4. Extracting population statistics
  5. Exporting gated populations as .fcs files

Why analyse flow cytometry data in Python?¶

With easy to use, graphical tools like FlowJo, FCSExpress and Cytobank, you might wonder why someone would want to manually analyse their flow cytometry data by writing Python code. It’s true that for most basic cytometry experiments it will be easier and faster to use one of these platforms, but there are benefits to using Python:

  • It’s free and can be run on any computer without a license!
  • You can create re-useable analysis scripts that allow others to reproduce your analyses
  • Analysis scripts can be automatically run periodically to analyse new screening data in a folder
  • Access to a broader range of computational, statistical and plotting tools
  • Managing plugins (packages) is usually easier!
  • It makes you look cool ;)

In this tutorial we're going to use the flowkit package as it's probably the most mature and fully featured, but others exist such as CytoPy and pytometry. The flowkit package uses the bokeh package for its interactive graphics, so we import the show method from its plotting module. So that these interactive plots render in this notebook, we also need to call bokeh.io.output_notebook(). Don't forget that before using these packages you will need to install them, for example using pip install bokeh flowkit.

In [ ]:
import bokeh
from bokeh.plotting import show
import flowkit as fk
import numpy as np
import pandas as pd
bokeh.io.output_notebook()
Loading BokehJS ...

Working with single .fcs files: the sample class¶

We’ve already seen data structures such as lists, dictionaries, NumPy arrays, and pandas DataFrames. The flowkit package provides a new data structure to contain data from an .fcs file: the sample.

Reading and exploring an .fcs file¶

To read an .fcs file into Python, we use Sample() function from flowkit. If we call the sample object we just created, we get a print out just telling us the version of the .fcs file (version 3.0 in this case), the filename. the number of parameters (or "channels" but this technically has a different meaning in flow and we should avoid calling parameters "channels". Anyway, there are 10 here), and the number of events.

In [ ]:
sample = fk.Sample('data/raw/OpA_I2_C2_IM1_WB1_R1.fcs')
sample
Out[ ]:
Sample(v3.1, OpA_I2_C2_IM1_WB1_R1.fcs, 10 channels, 400000 events)

We can get some information abaout the parameters in the sample by looking at its channels attribute. This is displayed as a DataFrame showing us the short parameter name (pnn), the optional parameter label (pns), the amplifier gain (png; 1.0 = no gain), the lin/log amplification type (pne; required but rarely used now), and the upper value or range (pnr).

In [ ]:
sample.channels
Out[ ]:
channel_number pnn pns png pne pnr
0 1 FSC-A 1.0 (0.0, 0.0) 262144.0
1 2 FSC-H 1.0 (0.0, 0.0) 262144.0
2 3 SSC-A 1.0 (0.0, 0.0) 262144.0
3 4 FITC-A CD4 1.0 (0.0, 0.0) 262144.0
4 5 PE-A CCR7 1.0 (0.0, 0.0) 262144.0
5 6 PerCP-Cy5-5-A CD8 1.0 (0.0, 0.0) 262144.0
6 7 PE-Cy7-A CD3 1.0 (0.0, 0.0) 262144.0
7 8 APC-A CD45RA 1.0 (0.0, 0.0) 262144.0
8 9 APC-Cy7-A CD45 1.0 (0.0, 0.0) 262144.0
9 10 Time 1.0 (0.0, 0.0) 262144.0

We can get lists of the PnN and PnS labels by getting the pnn_labels and pns_labels attributes, respectively.

In [ ]:
sample.pnn_labels
Out[ ]:
['FSC-A',
 'FSC-H',
 'SSC-A',
 'FITC-A',
 'PE-A',
 'PerCP-Cy5-5-A',
 'PE-Cy7-A',
 'APC-A',
 'APC-Cy7-A',
 'Time']
In [ ]:
sample.pns_labels
Out[ ]:
['', '', '', 'CD4', 'CCR7', 'CD8', 'CD3', 'CD45RA', 'CD45', '']

We can extract the raw, event-level data either as a NumPy array or pandas DataFrame using the get_events() and as_dataframe() methods, respectively. The argument source allows us to specify the kind of data we want and can take the following values:

  • 'raw', returns uncompensated, untransforme data
  • 'comp', returns compensated, untransformed data
  • 'xform', returns transformed data (compensated if compensation was performed)

We will only have access to compensated and/or transformed values once we have performed these steps later.

In [ ]:
sample.get_events(source = 'raw')
Out[ ]:
array([[1.51025281e+05, 1.16233000e+05, 1.92647828e+05, ...,
        7.40429993e+02, 6.16455029e+03, 1.00000005e-03],
       [1.16219039e+05, 9.49560000e+04, 6.08364609e+04, ...,
        2.73942017e+03, 3.36955508e+04, 2.00000009e-03],
       [1.45838562e+05, 1.11304000e+05, 1.97575422e+05, ...,
        2.71890015e+02, 2.48805005e+03, 3.00000003e-03],
       ...,
       [1.48042719e+05, 1.07230000e+05, 1.53985562e+05, ...,
        5.19840027e+02, 1.42956006e+03, 2.00011002e+02],
       [1.40678719e+05, 1.03240000e+05, 2.42794891e+05, ...,
        9.09720032e+02, 1.33209004e+04, 2.00011002e+02],
       [2.01164328e+05, 1.45805000e+05, 2.46083188e+05, ...,
        4.25790009e+02, 3.51405005e+03, 2.00011002e+02]])
In [ ]:
sample.as_dataframe(source = 'raw')
Out[ ]:
pnn FSC-A FSC-H SSC-A FITC-A PE-A PerCP-Cy5-5-A PE-Cy7-A APC-A APC-Cy7-A Time
pns CD4 CCR7 CD8 CD3 CD45RA CD45
0 151025.281250 116233.0 192647.828125 351.140015 873.000000 172.660004 964.180054 740.429993 6164.550293 0.001000
1 116219.039062 94956.0 60836.460938 8130.540039 2537.520020 535.440002 28739.160156 2739.420166 33695.550781 0.002000
2 145838.562500 111304.0 197575.421875 353.080017 1519.020020 89.240005 863.300049 271.890015 2488.050049 0.003000
3 145582.078125 111470.0 203096.671875 492.760010 5422.300293 356.960022 2227.120117 579.690002 2491.469971 0.003000
4 85385.437500 69691.0 46193.339844 5377.680176 3166.080078 816.740051 43981.742188 2547.900146 25853.490234 0.005000
... ... ... ... ... ... ... ... ... ... ...
399995 140499.515625 112582.0 221049.421875 100.880005 1140.720093 194.000000 1994.320068 478.800018 8818.469727 200.009003
399996 158901.125000 121242.0 128474.562500 5032.360352 1689.739990 31.040001 1313.380005 1451.790039 14246.010742 200.009995
399997 148042.718750 107230.0 153985.562500 413.220001 995.220032 182.360001 234.740005 519.840027 1429.560059 200.011002
399998 140678.718750 103240.0 242794.890625 164.900009 564.540039 190.120010 830.320007 909.720032 13320.900391 200.011002
399999 201164.328125 145805.0 246083.187500 593.640015 735.260010 161.020004 172.660004 425.790009 3514.050049 200.011002

400000 rows × 10 columns

To extract a dictionary of FCS keywords, we use the get_metadata() method, further subsetting this using the key of the one we want. Below we can see this data was acquired on a FACSCanto instrument.

In [ ]:
sample.get_metadata()
Out[ ]:
{'beginanalysis': '0',
 'begindata': '2698',
 'beginstext': '0',
 'byteord': '1,2,3,4',
 'datatype': 'F',
 'endanalysis': '0',
 'enddata': '16002697',
 'endstext': '0',
 'mode': 'L',
 'nextdata': '0',
 'par': '10',
 'tot': '400000',
 'p1b': '32',
 'p1e': '0,0',
 'p1g': '1.0',
 'p1r': '262144',
 'p1n': 'FSC-A',
 'p2b': '32',
 'p2e': '0,0',
 'p2g': '1.0',
 'p2r': '262144',
 'p2n': 'FSC-H',
 'p3b': '32',
 'p3e': '0,0',
 'p3g': '1.0',
 'p3r': '262144',
 'p3n': 'SSC-A',
 'p4b': '32',
 'p4e': '0,0',
 'p4g': '1.0',
 'p4r': '262144',
 'p4n': 'FITC-A',
 'p4s': 'CD4',
 'p5b': '32',
 'p5e': '0,0',
 'p5g': '1.0',
 'p5r': '262144',
 'p5n': 'PE-A',
 'p5s': 'CCR7',
 'p6b': '32',
 'p6e': '0,0',
 'p6g': '1.0',
 'p6r': '262144',
 'p6n': 'PerCP-Cy5-5-A',
 'p6s': 'CD8',
 'p7b': '32',
 'p7e': '0,0',
 'p7g': '1.0',
 'p7r': '262144',
 'p7n': 'PE-Cy7-A',
 'p7s': 'CD3',
 'p8b': '32',
 'p8e': '0,0',
 'p8g': '1.0',
 'p8r': '262144',
 'p8n': 'APC-A',
 'p8s': 'CD45RA',
 'p9b': '32',
 'p9e': '0,0',
 'p9g': '1.0',
 'p9r': '262144',
 'p9n': 'APC-Cy7-A',
 'p9s': 'CD45',
 'p10b': '32',
 'p10e': '0,0',
 'p10g': '1.0',
 'p10r': '262144',
 'p10n': 'Time',
 'fil': 'OpA_I2_C2_IM1_WB1_R1.fcs',
 'sys': 'Windows XP 5.1',
 'src': '           ',
 'date': '19-APR-2018',
 'btim': '13:20:39',
 'etim': '13:24:02',
 'cyt': 'FACSCanto',
 'op': '             ',
 'inst': ' ',
 'p1v': '97',
 'p2v': '97',
 'p3v': '423',
 'p4v': '483',
 'p5v': '620',
 'p6v': '544',
 'p7v': '754',
 'p8v': '582',
 'p9v': '706',
 'creator': 'BD FACSDiva Software Version 6.1.3',
 'tube name': '         ',
 'experiment name': '                       ',
 'guid': '2ab5d204-673d-4b77-a251-591efc88030c',
 'cytnum': '1',
 'window extension': '7.00',
 'export user name': '             ',
 'export time': '14-MAY-2018-13:48:28',
 'fsc asf': '1.12',
 'autobs': 'TRUE',
 'laser1name': 'Blue',
 'laser1delay': '0.00',
 'laser1asf': '1.94',
 'laser2name': 'Red',
 'laser2delay': '28.48',
 'laser2asf': '1.71',
 'spill': '6,FITC-A,PE-A,PerCP-Cy5-5-A,PE-Cy7-A,APC-A,APC-Cy7-A,1,0.20000000316200833,0.010000259790764056,0,0,0,0.004999996329852424,1,0.037000002693247325,0.0040000050732601836,0,0,0.0020290974735818383,0.03500123911686522,1,1.3499999934932951,0.02000015592539009,0.09999999876097611,0.0005655546867790353,0.012001239604607583,0.013000054959324544,1,0.00042784745930640527,0.023518117238373906,0,0,0.0032115590263317095,0,1,0.06999999330644507,0,0.0005826769847731325,0.002967216706987719,0.050000015483425794,0.06500000567621908,1',
 'apply compensation': 'TRUE',
 'threshold': 'FSC,20000',
 'p1display': 'LIN',
 'p1bs': '0',
 'p1ms': '0',
 'p2display': 'LIN',
 'p2bs': '0',
 'p2ms': '0',
 'p3display': 'LIN',
 'p3bs': '0',
 'p3ms': '0',
 'p4display': 'LOG',
 'p4bs': '68',
 'p4ms': '0',
 'p5display': 'LOG',
 'p5bs': '144',
 'p5ms': '0',
 'p6display': 'LOG',
 'p6bs': '185',
 'p6ms': '0',
 'p7display': 'LOG',
 'p7bs': '1230',
 'p7ms': '0',
 'p8display': 'LOG',
 'p8bs': '399',
 'p8ms': '0',
 'p9display': 'LOG',
 'p9bs': '658',
 'p9ms': '0',
 'p10bs': '0',
 'p10ms': '0',
 'sample id': ' ',
 'patient id': ' ',
 'case number': ' ',
 'cst setup status': 'SUCCESS',
 'cst beads lot id': '41720',
 'cytometer config name': '                ',
 'cytometer config create date': '11/03/2016 01:04:40 PM',
 'cst setup date': '12/06/2016 02:33:54 PM',
 'cst baseline date': '11/03/2016 01:20:27 PM'}
In [ ]:
sample.get_metadata()['cyt']
Out[ ]:
'FACSCanto'

Compensating a sample¶

If your panel of fluorophores exhibit spectral overlap with each other, it’s important to compensate your data using single colour controls. In this section we’re going to compensate our data using the compensation matrix calculated by the instrument during acquisition. This compensation matrix is saved as a keyword in each .fcs file from the experiment. We could also import a compensation matrix from FlowJo.

For this .fcs file, the spillover matrix is stored in the 'spill' keyword, but different instrument vendors may use 'spillover' or '$SPILLOVER'. Once we extract the spillover matrix (as a string actually), we use the apply_compensation() method to compensate the data. The sample now contains both raw and comp data and we can access either. We can also modify the compensation matric if we wish, but we won't do that in this tutorial.

In [ ]:
comp_mat = sample.metadata['spill']
sample.apply_compensation(comp_mat)
sample.compensation.as_dataframe()
Out[ ]:
FITC-A PE-A PerCP-Cy5-5-A PE-Cy7-A APC-A APC-Cy7-A
FITC-A 1.000000 0.200000 0.010000 0.000 0.000000 0.000000
PE-A 0.005000 1.000000 0.037000 0.004 0.000000 0.000000
PerCP-Cy5-5-A 0.002029 0.035001 1.000000 1.350 0.020000 0.100000
PE-Cy7-A 0.000566 0.012001 0.013000 1.000 0.000428 0.023518
APC-A 0.000000 0.000000 0.003212 0.000 1.000000 0.070000
APC-Cy7-A 0.000000 0.000583 0.002967 0.050 0.065000 1.000000

Transforming parameter values¶

Most modern flow cytometers export linear data for all parameters. We usually need to apply some transformation to the fluorescence parameters to visualise the distribution of the data across orders of magnitude.

There are different transformations we can apply. For mass cytometry data the asinh transform is commonly used. For fluorescence cytometry data, the logicle transformation is a logical choice ;). flowkit has functions for applying asinh and logicle transformations, but here we're going to apply the biexponential transformation, with the same parameters as those found in FlowJo. We first define the transformation using fk.transforms.WSPBiexTransform(), giving our transformation a name and setting its parameters. I've stuck with the default arguments here, but width and negative are the main ones you might wish to play with (width mainly).

We then use the apply_transform() method to transform our fluorescence values. This method automatically identifies fluorescent parameters, though we could supply a dictionary of parameter:transform pairs to a) specify which parameters we wanted to transform, and b) apply different transforms per parameter.

In [ ]:
biex_xform = fk.transforms.WSPBiexTransform(
    'biex',
    max_value = 262144,
    positive  = 4.42,
    width     = -10,
    negative  = 0
)

sample.apply_transform(biex_xform)

Let's remind ourselves of the parameters, the labels and numbers.

In [ ]:
sample.channels
Out[ ]:
channel_number pnn pns png pne pnr
0 1 FSC-A 1.0 (0.0, 0.0) 262144.0
1 2 FSC-H 1.0 (0.0, 0.0) 262144.0
2 3 SSC-A 1.0 (0.0, 0.0) 262144.0
3 4 FITC-A CD4 1.0 (0.0, 0.0) 262144.0
4 5 PE-A CCR7 1.0 (0.0, 0.0) 262144.0
5 6 PerCP-Cy5-5-A CD8 1.0 (0.0, 0.0) 262144.0
6 7 PE-Cy7-A CD3 1.0 (0.0, 0.0) 262144.0
7 8 APC-A CD45RA 1.0 (0.0, 0.0) 262144.0
8 9 APC-Cy7-A CD45 1.0 (0.0, 0.0) 262144.0
9 10 Time 1.0 (0.0, 0.0) 262144.0

We can plot two parameters against each other to visualize the effectiveness of the compensation and transformation using the plot_scatter() method (slightly confusingly this refers to a scatter plot, not a plot of the light-scatter parameters), we can either provide the "channel_number" or the "pnn" from the table above to set the x and y axes. Setting source = 'xform' ensures we are visualizing the compensated, transformed data. By default, a subsample of the events (determined when the sample is loaded) is plotted, but you could plot everything by setting subsample = False. Once we've created the plot, we need to visualise it by calling show(). Play around with the width argument of the fk.transforms.WSPBiexTransform() above and see how this impacts the plotted data.

In [ ]:
p = sample.plot_scatter(4, 6, source = 'xform', subsample = True)
show(p)
In [ ]:
p = sample.plot_scatter('FITC-A', 'PerCP-Cy5-5-A', source = 'xform', subsample = True)
show(p)

It's a good idea to visualise all the parameters to ensure the transformation we applied is suitable for all (and to identify any potential compensation errors). One quick way to do this is using the plot_scatter_matrix() method, which will plot all combinations of parameters we provide as a list.

We can use the export_png() function from bokeh's io module to save the image as a file. Note that if you get errors about needing to install "selenium" or a different browser, you may have to run the following in your terminal/powershell: pip install selenium and conda install -c conda-forge firefox geckodriver (these are only needed for saving the plots as images).

In [ ]:
p = sample.plot_scatter_matrix([4, 5, 6, 7, 8, 9])
show(p)
bokeh.io.export_png(p, filename = 'plots/scatter matrix.png')
Out[ ]:
'c:\\Users\\u061745\\OneDrive - UCB\\Documents\\Projects\\Python-for-cytometry-course\\plots\\scatter matrix.png'

The method above quickly becomes unsuitable with even a modest number of parameters, so below we do this one by one against a fixed y axis of our choice.

In [ ]:
for param in [4, 5, 6, 7, 8]:
    p = sample.plot_scatter(param, 9, source = 'xform', subsample = True)
    
    show(p)
    
    bokeh.io.export_png(
        p, 
        filename = 'plots/' + sample.pnn_labels[param - 1] + '.png'
    )

Looks like we have some events buried below the axis on some of our parameters. We'll continue for now but I'll leave it as an exercise for you to play around with the transformation.

The flowkit package uses bokeh for its plotting, but there is nothing stopping you from extracting the data in DataFrame format and using seaborn like we did in the previous tutorial.

Manual gating¶

Just because we’re typing, doesn’t mean we can’t manually gate our data like we do in FlowJo. We can create rectangle, polygon, ellipsoid, and quadrant gates in Python just like in any other analysis software. To do this, we’re going to create a Session object which will hold the information about our gates and samples.

Once we've instantiated a session with Session() we can add all of the files in a directory to it using add-Samples() and providing the path to the files (we could also give a list of specific filepaths if we wanted). We can compensate and transform all the data using add_comp_matrix() and add_transform(), respectively, applying the compensation matrix and transformation we defined earlier. Printing the session object just tells us we have 9 samples, and we can get their ids with get_sample_ids().

In [ ]:
session = fk.Session()
session.add_samples('data/raw/')
session.add_comp_matrix(sample.compensation)
session.add_transform(biex_xform)
print(session)
session.get_sample_ids()
Session(9 samples)
Out[ ]:
['OpA_I2_C2_IM1_WB1_R1.fcs',
 'OpA_I2_C2_IM1_WB1_R2.fcs',
 'OpA_I2_C2_IM1_WB1_R3.fcs',
 'OpA_I2_C2_IM1_WB2_R1.fcs',
 'OpA_I2_C2_IM1_WB2_R2.fcs',
 'OpA_I2_C2_IM1_WB2_R3.fcs',
 'OpA_I2_C2_IM1_WB3_R1.fcs',
 'OpA_I2_C2_IM1_WB3_R2.fcs',
 'OpA_I2_C2_IM1_WB3_R3.fcs']

Rectangle gates¶

We'll start by gating on that lymphocyte population by FSC-A and SSC-A. I wouldn't normally use a rectangle gate here but it's a good place to start. Let's plot the scatter parameters so we can get an idea of where we want our gate to go.

In [ ]:
p = sample.plot_scatter('FSC-A', 'SSC-A')
show(p)

To create our rectangle gate, we need to define a Dimension object for each parameter of our gate (i.e. 2). We do this with the Dimension() function, specifying the parameter and it's minimum and maximum values (if range_min or range_max are not given then minimum or maximum is unbounded, respectively.) We then create a RectangleGate object using the function of the same name, giving the name of the gate and a list of our Dimension objects.

We then use the add_gate() method, giving it the RectangleGate object and the name of the parent population (we use 'root', to indicate this gate is at the top of the hierarchy). To actually gate the data and assign each event to be in or our of the gate, we use the analyze_samples() method, and finally use get_analysis_report() to get a nice tabular summary of the results.

In [ ]:
fsc_a = fk.Dimension('FSC-A', range_min = 5e4, range_max = 1.2e5)
ssc_a = fk.Dimension('SSC-A', range_min = 1e4, range_max = 8e4)

scatter_gate = fk.gates.RectangleGate(
    'scatter', 
    dimensions = [fsc_a, ssc_a]
)

session.add_gate(scatter_gate, gate_path = ('root',))

session.analyze_samples(verbose = True)
session.get_analysis_report()
#### Processing gates for 9 samples (multiprocessing is enabled - 9 cpus) ####
Out[ ]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 OpA_I2_C2_IM1_WB1_R1.fcs (root,) scatter RectangleGate None root 127122 31.78050 31.78050 1
1 OpA_I2_C2_IM1_WB1_R2.fcs (root,) scatter RectangleGate None root 132175 33.04375 33.04375 1
2 OpA_I2_C2_IM1_WB1_R3.fcs (root,) scatter RectangleGate None root 133872 33.46800 33.46800 1
3 OpA_I2_C2_IM1_WB2_R1.fcs (root,) scatter RectangleGate None root 116529 29.13225 29.13225 1
4 OpA_I2_C2_IM1_WB2_R2.fcs (root,) scatter RectangleGate None root 118876 29.71900 29.71900 1
5 OpA_I2_C2_IM1_WB2_R3.fcs (root,) scatter RectangleGate None root 119713 29.92825 29.92825 1
6 OpA_I2_C2_IM1_WB3_R1.fcs (root,) scatter RectangleGate None root 176388 44.09700 44.09700 1
7 OpA_I2_C2_IM1_WB3_R2.fcs (root,) scatter RectangleGate None root 173472 43.36800 43.36800 1
8 OpA_I2_C2_IM1_WB3_R3.fcs (root,) scatter RectangleGate None root 170989 42.74725 42.74725 1

It's a good idea to visualise our gate to make sure we're happy with its placement. Here, we isolate just a single test case to plot.

In [ ]:
case = session.get_sample_ids()[0]
p = session.plot_gate(case, 'scatter')
show(p)

That looks like it captures the events we want it to. If we did want to make a change, we'll get an error if we try to define a gate that already exists, and we must first remove it with g_strat.remove_gate(<name of population>).

Polygon gates¶

Let’s use a polygon gate to exclude doublets. First, let's plot FSC-A against FSC-H using the session class's plot_scatter() method. To this we give the name of the sample we wish to plot, the x and x parameters in the form of Dimension objects (this is a bit annoying but it is what it is), and the name of the population we wish to plot (we could omit this argument to plot everything, give it a go).

In [ ]:
fsc_a = fk.Dimension('FSC-A')
fsc_h = fk.Dimension('FSC-H')

p = session.plot_scatter(
    case,
    x_dim     = fsc_a,
    y_dim     = fsc_h,
    gate_name = 'scatter'
)

show(p)

To create a polygon gate in Python, you need to create a list whose elements are lists of x y coordinate pairs for each of the vertices of your polygon. In the example below, we're creating a polygon gate with four vertices where the first and second value in each pair are the FSC-A and FSC-H values, respectively. I then define a Dimension object for FSC-H (we'll reuse the FSC-A one from earlier), then use the PolygonGate() function to create the gate. We then add this gate to the GatingStrategy as before and generate the report.

In [ ]:
vertices = [
    [50000,  38000],
    [100000, 80000],
    [100000, 85000],
    [50000,  45000]
]

singlet_gate = fk.gates.PolygonGate(
    'singlets',
    dimensions = [fsc_a, fsc_h],
    vertices   = vertices
)

session.add_gate(singlet_gate, gate_path = ('root', 'scatter'))
session.analyze_samples(verbose = True)
session.get_analysis_report()
#### Processing gates for 9 samples (multiprocessing is enabled - 9 cpus) ####
Out[ ]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 OpA_I2_C2_IM1_WB1_R1.fcs (root,) scatter RectangleGate None root 127122 31.78050 31.780500 1
1 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter) singlets PolygonGate None scatter 115728 28.93200 91.036957 2
2 OpA_I2_C2_IM1_WB1_R2.fcs (root,) scatter RectangleGate None root 132175 33.04375 33.043750 1
3 OpA_I2_C2_IM1_WB1_R2.fcs (root, scatter) singlets PolygonGate None scatter 120130 30.03250 90.887082 2
4 OpA_I2_C2_IM1_WB1_R3.fcs (root,) scatter RectangleGate None root 133872 33.46800 33.468000 1
5 OpA_I2_C2_IM1_WB1_R3.fcs (root, scatter) singlets PolygonGate None scatter 120928 30.23200 90.331063 2
6 OpA_I2_C2_IM1_WB2_R1.fcs (root,) scatter RectangleGate None root 116529 29.13225 29.132250 1
7 OpA_I2_C2_IM1_WB2_R1.fcs (root, scatter) singlets PolygonGate None scatter 103375 25.84375 88.711823 2
8 OpA_I2_C2_IM1_WB2_R2.fcs (root,) scatter RectangleGate None root 118876 29.71900 29.719000 1
9 OpA_I2_C2_IM1_WB2_R2.fcs (root, scatter) singlets PolygonGate None scatter 106191 26.54775 89.329217 2
10 OpA_I2_C2_IM1_WB2_R3.fcs (root,) scatter RectangleGate None root 119713 29.92825 29.928250 1
11 OpA_I2_C2_IM1_WB2_R3.fcs (root, scatter) singlets PolygonGate None scatter 105490 26.37250 88.119085 2
12 OpA_I2_C2_IM1_WB3_R1.fcs (root,) scatter RectangleGate None root 176388 44.09700 44.097000 1
13 OpA_I2_C2_IM1_WB3_R1.fcs (root, scatter) singlets PolygonGate None scatter 158237 39.55925 89.709617 2
14 OpA_I2_C2_IM1_WB3_R2.fcs (root,) scatter RectangleGate None root 173472 43.36800 43.368000 1
15 OpA_I2_C2_IM1_WB3_R2.fcs (root, scatter) singlets PolygonGate None scatter 153580 38.39500 88.533020 2
16 OpA_I2_C2_IM1_WB3_R3.fcs (root,) scatter RectangleGate None root 170989 42.74725 42.747250 1
17 OpA_I2_C2_IM1_WB3_R3.fcs (root, scatter) singlets PolygonGate None scatter 140998 35.24950 82.460275 2
In [ ]:
p = session.plot_gate(case, 'singlets')
show(p)

At this point, if we wanted to vsiaulise the hierarchy we're creating, we can do so with the get_gate_hierarchy() method.

In [ ]:
print(session.get_gate_hierarchy())
root
╰── scatter
    ╰── singlets

Ellipsoid gates¶

Next, let’s gate on our lymphocyte population using an ellipsoid gate around the CD45+ CD3+ events. Again, let’s plot the parameters and highlight the population we’re interested in to help us draw the gate.

In [ ]:
cd45 = fk.Dimension('APC-Cy7-A')
cd3 = fk.Dimension('PE-Cy7-A')

p = session.plot_scatter(
    case,
    x_dim     = cd45,
    y_dim     = cd3,
    gate_name = 'singlets'
)

show(p)

Hmm, that doesn't look right, what's happened? Well as we're now working with fluorescence parameters instead of scatter parameters, we need to specifically ask for compensated and transformed data, and we do this inside the definition of each Dimension object. This might seem clunky, but it's because a Session object can contain multiple compensation matrices and transformations, especially if loaded in from a FlowJo workspace file. We can get the name of the compensation matric using the get_comp_matrices() method (noting there is only one here called 'custom_spill'), and we know the name of the biexponential transformation we defined earlier ('biex').

In [ ]:
session.get_comp_matrices()
Out[ ]:
[Matrix(custom_spill, dims: 6)]

We now modify the above code to get the compensated, transformed data.

In [ ]:
cd45 = fk.Dimension(
    'APC-Cy7-A', 
    compensation_ref = 'custom_spill', 
    transformation_ref = 'biex'
)

cd3 = fk.Dimension(
    'PE-Cy7-A', 
    compensation_ref = 'custom_spill', 
    transformation_ref = 'biex'
)

p = session.plot_scatter(
    case,
    x_dim     = cd45,
    y_dim     = cd3,
    gate_name = 'singlets'
)

show(p)

Creating ellipsoid gates can also be a little cumbersome in Python. Because an ellipse can be diagonal, we need to define a covariance matrix and a list of means, in order to create an ellipsoid gate.

NOTE: Covariance is just unstandardized correlation. If you want to create an ellipse that has a correlation between the two parameters (i.e. it is tilted or diagonal), you need to add some covariance. If you want to create an ellipse that shows no correlation between two parameters, leave the covariances 0 and just set the variance for each parameter individually (this sets the width of the ellipse in each axis).

In this example, we want an ellipse to encircle the CD45+ CD3+ events. As this population is not diagonal, we can set the covariance values in our matrix to 0, and just set the variances to control how wide the ellipse is around its mean. In the example below, I represent the covariance matrix as a NumPy array where the diagonal elements are the variances and the off-diagonals are the covariance.

We then use the EllipsoidGate() method to create the gate, giving the name of the population, the dimensions (remember that these Dimension definitions must include the relevant compensation and transformations), the covariance matrix, and the distance_square() argument, which is just a multiplier for the variance and can be used to shrink or expand the gate.

In [ ]:
center = [3150, 3500]
cov    = np.array(
    [[300 ** 2, 0], 
     [0, 300 ** 2]]
)

print(cov)

cd3_gate = fk.gates.EllipsoidGate(
    'cd3',
    dimensions        = [cd45, cd3],
    coordinates       = center,
    covariance_matrix = cov,
    distance_square   = 1
)

session.add_gate(cd3_gate, gate_path = ('root', 'scatter', 'singlets'))
session.analyze_samples()
session.get_analysis_report()
[[90000     0]
 [    0 90000]]
Out[ ]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 OpA_I2_C2_IM1_WB1_R1.fcs (root,) scatter RectangleGate None root 127122 31.78050 31.780500 1
1 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter) singlets PolygonGate None scatter 115728 28.93200 91.036957 2
2 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 61324 15.33100 52.989769 3
3 OpA_I2_C2_IM1_WB1_R2.fcs (root,) scatter RectangleGate None root 132175 33.04375 33.043750 1
4 OpA_I2_C2_IM1_WB1_R2.fcs (root, scatter) singlets PolygonGate None scatter 120130 30.03250 90.887082 2
5 OpA_I2_C2_IM1_WB1_R2.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 62621 15.65525 52.127695 3
6 OpA_I2_C2_IM1_WB1_R3.fcs (root,) scatter RectangleGate None root 133872 33.46800 33.468000 1
7 OpA_I2_C2_IM1_WB1_R3.fcs (root, scatter) singlets PolygonGate None scatter 120928 30.23200 90.331063 2
8 OpA_I2_C2_IM1_WB1_R3.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 62844 15.71100 51.968113 3
9 OpA_I2_C2_IM1_WB2_R1.fcs (root,) scatter RectangleGate None root 116529 29.13225 29.132250 1
10 OpA_I2_C2_IM1_WB2_R1.fcs (root, scatter) singlets PolygonGate None scatter 103375 25.84375 88.711823 2
11 OpA_I2_C2_IM1_WB2_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 58972 14.74300 57.046675 3
12 OpA_I2_C2_IM1_WB2_R2.fcs (root,) scatter RectangleGate None root 118876 29.71900 29.719000 1
13 OpA_I2_C2_IM1_WB2_R2.fcs (root, scatter) singlets PolygonGate None scatter 106191 26.54775 89.329217 2
14 OpA_I2_C2_IM1_WB2_R2.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 60170 15.04250 56.662052 3
15 OpA_I2_C2_IM1_WB2_R3.fcs (root,) scatter RectangleGate None root 119713 29.92825 29.928250 1
16 OpA_I2_C2_IM1_WB2_R3.fcs (root, scatter) singlets PolygonGate None scatter 105490 26.37250 88.119085 2
17 OpA_I2_C2_IM1_WB2_R3.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 60300 15.07500 57.161816 3
18 OpA_I2_C2_IM1_WB3_R1.fcs (root,) scatter RectangleGate None root 176388 44.09700 44.097000 1
19 OpA_I2_C2_IM1_WB3_R1.fcs (root, scatter) singlets PolygonGate None scatter 158237 39.55925 89.709617 2
20 OpA_I2_C2_IM1_WB3_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 73853 18.46325 46.672396 3
21 OpA_I2_C2_IM1_WB3_R2.fcs (root,) scatter RectangleGate None root 173472 43.36800 43.368000 1
22 OpA_I2_C2_IM1_WB3_R2.fcs (root, scatter) singlets PolygonGate None scatter 153580 38.39500 88.533020 2
23 OpA_I2_C2_IM1_WB3_R2.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 70976 17.74400 46.214351 3
24 OpA_I2_C2_IM1_WB3_R3.fcs (root,) scatter RectangleGate None root 170989 42.74725 42.747250 1
25 OpA_I2_C2_IM1_WB3_R3.fcs (root, scatter) singlets PolygonGate None scatter 140998 35.24950 82.460275 2
26 OpA_I2_C2_IM1_WB3_R3.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 39169 9.79225 27.779827 3

From this point, it may be easier to view the table of results one sample at a time, which we can do by looking at the report attribute of the object returned by get_gating_results().

In [ ]:
gating_results = session.get_gating_results(case)
gating_results.report
Out[ ]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 OpA_I2_C2_IM1_WB1_R1.fcs (root,) scatter RectangleGate None root 127122 31.7805 31.780500 1
1 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter) singlets PolygonGate None scatter 115728 28.9320 91.036957 2
2 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 61324 15.3310 52.989769 3

Now we can visualise the gate as before.

In [ ]:
p = session.plot_gate(case, 'cd3')
show(p)

Quadrant gates¶

Because Flowkit is compliant to the Gating-ML framework, creating quadrant gates is rather cumbersome. But as this is a common tool used for gating, we give an example of how to do it below. Let's create a plot of CD4 vs CD8 to help us define the quadrants we want to draw. As usual, we define Dimension objects for each marker, ensuring we use the compensated and transformed data.

In [ ]:
cd4 = fk.Dimension(
    'FITC-A', 
    compensation_ref = 'custom_spill', 
    transformation_ref = 'biex'
)

cd8 = fk.Dimension(
    'PerCP-Cy5-5-A', 
    compensation_ref = 'custom_spill', 
    transformation_ref = 'biex'
)

p = session.plot_scatter(
    case,
    x_dim     = cd4,
    y_dim     = cd8,
    gate_name = 'cd3'
)

show(p)

We would like to split the regions of thie plot into 4 quadrants such that we separately gate the CD4s, CD8s (and double negatives and positives). We first need to define two QuadrantDividers, which are objects that define the split in each axis we'd like the quadrants to be divided at. We then just store these together in a list called quad_divs.

In [ ]:
quad_div1 = fk.QuadrantDivider(
    'cd4-div',
    'FITC-A',
    compensation_ref   = 'custom_spill',
    transformation_ref = 'biex',
    values = [1700]
)

quad_div2 = fk.QuadrantDivider(
    'cd8-div',
    'PerCP-Cy5-5-A',
    compensation_ref   = 'custom_spill',
    transformation_ref = 'biex',
    values = [2000]
)

quad_divs = [quad_div1, quad_div2]

Next, we define the specific quadrants that will result from splitting up the space. We give each population a name using the quadrant_id argument, a list of the names of our divider_refs, and a list indicating the divider_ranges (where None can be taken to mean +/- infinity). For example, quad_1 below defines a population called 'cd4/8_double_pos in the quadrant with a CD4 intensity > 1700 and a CD8 intensity > 2000. In contrast, quad_2 defines a population called cd4 in the quadrant with a CD4 intensity > 1700 and a CD8 intensity <= 2000. We then just store these together in a list called quadrants.

In [ ]:
quad_1 = fk.gates.Quadrant(
    quadrant_id    = 'cd4/8_double_pos',
    divider_refs   = ['cd4-div', 'cd8-div'],
    divider_ranges = [(1700, None), (2000, None)]
)

quad_2 = fk.gates.Quadrant(
    quadrant_id    = 'cd4',
    divider_refs   = ['cd4-div', 'cd8-div'],
    divider_ranges = [(1700, None), (None, 2000)]
)

quad_3 = fk.gates.Quadrant(
    quadrant_id    = 'cd8',
    divider_refs   = ['cd4-div', 'cd8-div'],
    divider_ranges = [(None, 1700), (2000, None)]
)

quad_4 = fk.gates.Quadrant(
    quadrant_id    = 'cd4/8_double_neg',
    divider_refs   = ['cd4-div', 'cd8-div'],
    divider_ranges = [(None, 1700), (None, 2000)]
)

quadrants = [quad_1, quad_2, quad_3, quad_4]

After all that work we can finally define the QuadrantGate object and add it to our Session object, proving the lists of Dividers and Quadrants we just defined.

In [ ]:
quad_gate1 = fk.gates.QuadrantGate(
    'cd4_cd8_quad',
    dividers  = quad_divs,
    quadrants = quadrants
)

session.add_gate(quad_gate1, gate_path = ('root', 'scatter', 'singlets', 'cd3'))

print(session.get_gate_hierarchy())

session.analyze_samples()
gating_results = session.get_gating_results(case)
gating_results.report
root
╰── scatter
    ╰── singlets
        ╰── cd3
            ╰── cd4_cd8_quad
                ├── cd4/8_double_pos
                ├── cd4
                ├── cd8
                ╰── cd4/8_double_neg
Out[ ]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level
0 OpA_I2_C2_IM1_WB1_R1.fcs (root,) scatter RectangleGate None root 127122 31.78050 31.780500 1
1 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter) singlets PolygonGate None scatter 115728 28.93200 91.036957 2
2 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 61324 15.33100 52.989769 3
4 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 39580 9.89500 64.542430 4
6 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets, cd3) cd4/8_double_neg QuadrantGate cd4_cd8_quad cd3 741 0.18525 1.208336 4
3 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets, cd3) cd4/8_double_pos QuadrantGate cd4_cd8_quad cd3 384 0.09600 0.626182 4
5 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 20619 5.15475 33.623051 4

We can plot our quadrants as before, but note that we must interact with the named QuadrantGate, not each quadrant individually.

In [ ]:
p = session.plot_gate(
    case, 
    gate_name = 'cd4_cd8_quad'
)

show(p)

Extracting population statistics¶

Now that we’ve finished gating our data, we probably want to extract some summary statistics per sample, per gate. The block below may look a little intimidating just because I haven't found an easier way of doing this, but let's go through it step by step.

First, we retrieve the compensation matrix and make a list of the gates we want statistics for.

Second, we create placeholder lists, one the same length as the number of gates and the other the same length as the number of samples.

Third, we iterate over each gate, within each sample, extracting a DataFrame of the gate events, adding columns to indicate the sample and gate, and appending it to the frame list. Once we've iterated over each gate in a sample we concatenate them into a single table for that sample that gets appended onto the frames list. Once the process has been repeated for each sample, we have a list of DataFrames that is once again concatenated into a single DataFrame.

Finally, the DataFrame we end up with has two rows of columns, which can be convenient for display but will cause us problems later when we join this table with another one. So we remove the second set of column labels using droplevel(1, axis = 1) (where the 1 indicates the second row and axis = 1 indicates we want to drop a column label, not a row label).

In [ ]:
comp  = session.get_comp_matrices()[0]
gates = ['scatter', 'singlets', 'cd3', 'cd4', 'cd8']

frame  = [0] * len(gates)
frames = [0] * len(session.get_sample_ids())

for name_ind, name in enumerate(session.get_sample_ids()):
    for gate_ind, gate in enumerate(gates):
        frame[gate_ind] = session.get_gate_events(
            name,
            gate_name = gate,
            matrix    = comp,
            transform = biex_xform
        )

        frame[gate_ind]['sample'] = name
        frame[gate_ind]['gate_name'] = gate

    frames[name_ind] = pd.concat(frame)

frames = pd.concat(frames)
frames = frames.droplevel(1, axis = 1)
frames
Out[ ]:
pnn FSC-A FSC-H SSC-A FITC-A PE-A PerCP-Cy5-5-A PE-Cy7-A APC-A APC-Cy7-A Time sample gate_name
1 116219.039062 94956.0 60836.460938 2693.074882 1568.251127 388.405031 3182.470663 1576.743492 3262.084927 0.002000 OpA_I2_C2_IM1_WB1_R1.fcs scatter
4 85385.437500 69691.0 46193.339844 2522.430369 2010.843012 743.455975 3365.240966 1779.159772 3146.113409 0.005000 OpA_I2_C2_IM1_WB1_R1.fcs scatter
5 93750.718750 76884.0 36037.441406 647.642239 843.651185 292.877092 1031.396914 807.299111 2569.393741 0.006000 OpA_I2_C2_IM1_WB1_R1.fcs scatter
6 69603.523438 57286.0 28271.621094 2665.484255 2890.146957 510.446184 3462.074423 1523.258805 3094.958703 0.008000 OpA_I2_C2_IM1_WB1_R1.fcs scatter
7 69679.679688 59490.0 25382.960938 2450.164382 2544.513800 1167.203579 3379.043537 2024.134048 3104.151284 0.008000 OpA_I2_C2_IM1_WB1_R1.fcs scatter
... ... ... ... ... ... ... ... ... ... ... ... ...
399778 98247.523438 80400.0 46125.441406 754.832138 1631.039471 3185.086271 3339.554956 2004.359906 2953.744799 163.190002 OpA_I2_C2_IM1_WB3_R3.fcs cd8
399840 85462.718750 69455.0 43126.203125 1139.117260 1714.930117 3096.326423 3433.256091 1532.504700 2881.940483 163.210007 OpA_I2_C2_IM1_WB3_R3.fcs cd8
399857 88887.679688 73463.0 45411.519531 720.210632 1466.005584 3019.473351 3533.625819 2349.002653 2862.171702 163.216003 OpA_I2_C2_IM1_WB3_R3.fcs cd8
399953 72181.757812 57811.0 45337.800781 932.333396 1913.562101 3142.703219 3571.499703 1532.719127 2954.807660 163.255005 OpA_I2_C2_IM1_WB3_R3.fcs cd8
399983 79176.164062 63691.0 41063.980469 408.620567 2177.116483 3048.663364 3436.104832 0.000000 2971.021394 163.265991 OpA_I2_C2_IM1_WB3_R3.fcs cd8

3473947 rows × 12 columns

Phew! That was a lot of work! That gives us a nice DataFrame of single cell data of the gates of interest, where every event is annotated by its sample of orgin and population. What we'd like to do is calculate the median intensity for each marker separately for each population and sample. We can do this by first grouping the DataFrame by sample and gate_name, then using the agg() method to calculate the median of each marker.

In [ ]:
medians = frames.groupby(['sample', 'gate_name']).agg(
    {'FITC-A'        : 'median',
     'PE-A'          : 'median',
     'PerCP-Cy5-5-A' : 'median',
     'PE-Cy7-A'      : 'median',
     'APC-A'         : 'median',
     'APC-Cy7-A'     : 'median'}
)

medians
Out[ ]:
pnn FITC-A PE-A PerCP-Cy5-5-A PE-Cy7-A APC-A APC-Cy7-A
sample gate_name
OpA_I2_C2_IM1_WB1_R1.fcs cd3 2448.656418 2556.682626 1189.141019 3494.773710 1623.562180 3138.078764
cd4 2504.224808 2640.803679 897.839425 3549.533868 1376.106169 3120.934437
cd8 928.206770 2167.162883 3135.428674 3415.849628 2340.662568 3176.701838
scatter 1044.081758 2083.788766 1045.632647 3290.920241 2368.278801 3103.907538
singlets 1037.251193 2157.618375 1077.148351 3312.846526 2436.006097 3105.897302
OpA_I2_C2_IM1_WB1_R2.fcs cd3 2432.733272 2534.581248 1176.837506 3481.712588 1621.440822 3121.853209
cd4 2488.699773 2625.121875 889.202780 3536.892817 1349.841359 3105.410951
cd8 940.987379 2092.636582 3109.487259 3401.402924 2356.932904 3158.606084
scatter 1047.680557 2039.360009 1022.652481 3261.204134 2403.305023 3085.378309
singlets 1040.099008 2116.236021 1056.676042 3285.735868 2477.167214 3087.681993
OpA_I2_C2_IM1_WB1_R3.fcs cd3 2439.976137 2534.515412 1184.755494 3486.000428 1602.343445 3122.160008
cd4 2496.429268 2626.753967 901.244845 3541.241251 1343.525952 3105.819597
cd8 932.733258 2122.145915 3123.510714 3406.154205 2308.175072 3159.893206
scatter 1043.411092 2054.967578 1029.501463 3265.015720 2387.901735 3086.562506
singlets 1036.006345 2133.128594 1062.286350 3291.753223 2454.058692 3088.411071
OpA_I2_C2_IM1_WB2_R1.fcs cd3 2446.816191 2010.260980 959.769319 3571.557686 1158.809997 3171.344827
cd4 2494.134986 2152.444117 688.367792 3598.911094 915.013242 3157.265908
cd8 927.506400 1667.676912 3103.854743 3522.644201 1755.949658 3212.207612
scatter 1216.943779 1608.284567 950.229950 3477.090929 1685.449479 3133.937228
singlets 1230.260450 1645.794738 979.548801 3490.060648 1734.659169 3134.715931
OpA_I2_C2_IM1_WB2_R2.fcs cd3 2443.791408 2006.128292 966.909870 3570.432531 1089.668715 3168.107434
cd4 2492.706541 2146.880026 689.011519 3597.886694 797.814300 3154.348270
cd8 926.134100 1673.836071 3108.752529 3522.098159 1738.492336 3206.085622
scatter 1208.053982 1607.822970 956.748600 3475.217260 1651.215458 3129.957869
singlets 1215.924986 1640.725396 984.305327 3488.475905 1702.610529 3131.053175
OpA_I2_C2_IM1_WB2_R3.fcs cd3 2444.967229 2003.301148 968.547454 3570.279317 1068.120685 3163.345876
cd4 2492.989829 2141.858292 695.833584 3596.225356 795.862377 3149.560576
cd8 931.038728 1670.993442 3106.944464 3521.126293 1720.975158 3202.239944
scatter 1222.102893 1599.360908 961.231073 3476.549015 1630.831888 3125.688064
singlets 1232.560075 1630.424770 989.636735 3488.491998 1689.012435 3126.179116
OpA_I2_C2_IM1_WB3_R1.fcs cd3 2289.397332 2170.490828 1074.431139 3579.790835 1230.680056 3159.628587
cd4 2350.065987 2225.420510 755.465517 3607.874985 804.935709 3155.022823
cd8 931.196247 2104.577307 3139.040108 3530.864005 2357.675594 3178.013547
scatter 1072.792507 2015.653490 820.781538 3398.765742 2608.899327 3110.566650
singlets 1059.101832 2081.829667 819.942678 3401.894328 2631.303365 3108.560180
OpA_I2_C2_IM1_WB3_R2.fcs cd3 2295.457451 2169.441544 1091.625568 3583.036350 1207.304119 3150.577366
cd4 2357.150206 2227.422388 772.460385 3612.616478 784.224166 3145.866571
cd8 932.391321 2089.846029 3150.548929 3533.503489 2338.635397 3167.712477
scatter 1077.838810 2002.551671 841.845392 3410.039901 2585.106139 3098.337671
singlets 1061.298415 2075.553306 837.341563 3408.659173 2609.467939 3095.200655
OpA_I2_C2_IM1_WB3_R3.fcs cd3 2315.877559 2039.122338 1130.076376 3570.438694 869.444754 3001.444447
cd4 2376.316260 2163.037784 814.797382 3589.654872 587.174382 2993.992196
cd8 910.938606 1808.606282 3160.328457 3537.160325 1746.120827 3020.115676
scatter 1070.221959 2018.911086 934.225842 3412.363145 2401.378002 2887.457100
singlets 1047.282742 2106.394827 919.460220 3397.143329 2430.302715 2878.502707

Now, let's join this DataFrame with the analysis report so we get one big DatFrame that has all of the proportions and medians for each sample and population. We extract the table of proportions using the get_analysis_report() method, then use the merge() method to join it to our table of medians, matching the rows together based on their common values of sample and gate_name.

In [ ]:
props = session.get_analysis_report()
summ = props.merge(medians, on = ['sample', 'gate_name'])
summ
Out[ ]:
sample gate_path gate_name gate_type quadrant_parent parent count absolute_percent relative_percent level FITC-A PE-A PerCP-Cy5-5-A PE-Cy7-A APC-A APC-Cy7-A
0 OpA_I2_C2_IM1_WB1_R1.fcs (root,) scatter RectangleGate None root 127122 31.78050 31.780500 1 1044.081758 2083.788766 1045.632647 3290.920241 2368.278801 3103.907538
1 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter) singlets PolygonGate None scatter 115728 28.93200 91.036957 2 1037.251193 2157.618375 1077.148351 3312.846526 2436.006097 3105.897302
2 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 61324 15.33100 52.989769 3 2448.656418 2556.682626 1189.141019 3494.773710 1623.562180 3138.078764
3 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 39580 9.89500 64.542430 4 2504.224808 2640.803679 897.839425 3549.533868 1376.106169 3120.934437
4 OpA_I2_C2_IM1_WB1_R1.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 20619 5.15475 33.623051 4 928.206770 2167.162883 3135.428674 3415.849628 2340.662568 3176.701838
5 OpA_I2_C2_IM1_WB1_R2.fcs (root,) scatter RectangleGate None root 132175 33.04375 33.043750 1 1047.680557 2039.360009 1022.652481 3261.204134 2403.305023 3085.378309
6 OpA_I2_C2_IM1_WB1_R2.fcs (root, scatter) singlets PolygonGate None scatter 120130 30.03250 90.887082 2 1040.099008 2116.236021 1056.676042 3285.735868 2477.167214 3087.681993
7 OpA_I2_C2_IM1_WB1_R2.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 62621 15.65525 52.127695 3 2432.733272 2534.581248 1176.837506 3481.712588 1621.440822 3121.853209
8 OpA_I2_C2_IM1_WB1_R2.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 40585 10.14625 64.810527 4 2488.699773 2625.121875 889.202780 3536.892817 1349.841359 3105.410951
9 OpA_I2_C2_IM1_WB1_R2.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 20830 5.20750 33.263602 4 940.987379 2092.636582 3109.487259 3401.402924 2356.932904 3158.606084
10 OpA_I2_C2_IM1_WB1_R3.fcs (root,) scatter RectangleGate None root 133872 33.46800 33.468000 1 1043.411092 2054.967578 1029.501463 3265.015720 2387.901735 3086.562506
11 OpA_I2_C2_IM1_WB1_R3.fcs (root, scatter) singlets PolygonGate None scatter 120928 30.23200 90.331063 2 1036.006345 2133.128594 1062.286350 3291.753223 2454.058692 3088.411071
12 OpA_I2_C2_IM1_WB1_R3.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 62844 15.71100 51.968113 3 2439.976137 2534.515412 1184.755494 3486.000428 1602.343445 3122.160008
13 OpA_I2_C2_IM1_WB1_R3.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 40783 10.19575 64.895615 4 2496.429268 2626.753967 901.244845 3541.241251 1343.525952 3105.819597
14 OpA_I2_C2_IM1_WB1_R3.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 20829 5.20725 33.143976 4 932.733258 2122.145915 3123.510714 3406.154205 2308.175072 3159.893206
15 OpA_I2_C2_IM1_WB2_R1.fcs (root,) scatter RectangleGate None root 116529 29.13225 29.132250 1 1216.943779 1608.284567 950.229950 3477.090929 1685.449479 3133.937228
16 OpA_I2_C2_IM1_WB2_R1.fcs (root, scatter) singlets PolygonGate None scatter 103375 25.84375 88.711823 2 1230.260450 1645.794738 979.548801 3490.060648 1734.659169 3134.715931
17 OpA_I2_C2_IM1_WB2_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 58972 14.74300 57.046675 3 2446.816191 2010.260980 959.769319 3571.557686 1158.809997 3171.344827
18 OpA_I2_C2_IM1_WB2_R1.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 40440 10.11000 68.574917 4 2494.134986 2152.444117 688.367792 3598.911094 915.013242 3157.265908
19 OpA_I2_C2_IM1_WB2_R1.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 15217 3.80425 25.803771 4 927.506400 1667.676912 3103.854743 3522.644201 1755.949658 3212.207612
20 OpA_I2_C2_IM1_WB2_R2.fcs (root,) scatter RectangleGate None root 118876 29.71900 29.719000 1 1208.053982 1607.822970 956.748600 3475.217260 1651.215458 3129.957869
21 OpA_I2_C2_IM1_WB2_R2.fcs (root, scatter) singlets PolygonGate None scatter 106191 26.54775 89.329217 2 1215.924986 1640.725396 984.305327 3488.475905 1702.610529 3131.053175
22 OpA_I2_C2_IM1_WB2_R2.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 60170 15.04250 56.662052 3 2443.791408 2006.128292 966.909870 3570.432531 1089.668715 3168.107434
23 OpA_I2_C2_IM1_WB2_R2.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 41060 10.26500 68.239987 4 2492.706541 2146.880026 689.011519 3597.886694 797.814300 3154.348270
24 OpA_I2_C2_IM1_WB2_R2.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 15708 3.92700 26.106033 4 926.134100 1673.836071 3108.752529 3522.098159 1738.492336 3206.085622
25 OpA_I2_C2_IM1_WB2_R3.fcs (root,) scatter RectangleGate None root 119713 29.92825 29.928250 1 1222.102893 1599.360908 961.231073 3476.549015 1630.831888 3125.688064
26 OpA_I2_C2_IM1_WB2_R3.fcs (root, scatter) singlets PolygonGate None scatter 105490 26.37250 88.119085 2 1232.560075 1630.424770 989.636735 3488.491998 1689.012435 3126.179116
27 OpA_I2_C2_IM1_WB2_R3.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 60300 15.07500 57.161816 3 2444.967229 2003.301148 968.547454 3570.279317 1068.120685 3163.345876
28 OpA_I2_C2_IM1_WB2_R3.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 41136 10.28400 68.218905 4 2492.989829 2141.858292 695.833584 3596.225356 795.862377 3149.560576
29 OpA_I2_C2_IM1_WB2_R3.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 15700 3.92500 26.036484 4 931.038728 1670.993442 3106.944464 3521.126293 1720.975158 3202.239944
30 OpA_I2_C2_IM1_WB3_R1.fcs (root,) scatter RectangleGate None root 176388 44.09700 44.097000 1 1072.792507 2015.653490 820.781538 3398.765742 2608.899327 3110.566650
31 OpA_I2_C2_IM1_WB3_R1.fcs (root, scatter) singlets PolygonGate None scatter 158237 39.55925 89.709617 2 1059.101832 2081.829667 819.942678 3401.894328 2631.303365 3108.560180
32 OpA_I2_C2_IM1_WB3_R1.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 73853 18.46325 46.672396 3 2289.397332 2170.490828 1074.431139 3579.790835 1230.680056 3159.628587
33 OpA_I2_C2_IM1_WB3_R1.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 48169 12.04225 65.222807 4 2350.065987 2225.420510 755.465517 3607.874985 804.935709 3155.022823
34 OpA_I2_C2_IM1_WB3_R1.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 22991 5.74775 31.130760 4 931.196247 2104.577307 3139.040108 3530.864005 2357.675594 3178.013547
35 OpA_I2_C2_IM1_WB3_R2.fcs (root,) scatter RectangleGate None root 173472 43.36800 43.368000 1 1077.838810 2002.551671 841.845392 3410.039901 2585.106139 3098.337671
36 OpA_I2_C2_IM1_WB3_R2.fcs (root, scatter) singlets PolygonGate None scatter 153580 38.39500 88.533020 2 1061.298415 2075.553306 837.341563 3408.659173 2609.467939 3095.200655
37 OpA_I2_C2_IM1_WB3_R2.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 70976 17.74400 46.214351 3 2295.457451 2169.441544 1091.625568 3583.036350 1207.304119 3150.577366
38 OpA_I2_C2_IM1_WB3_R2.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 46113 11.52825 64.969849 4 2357.150206 2227.422388 772.460385 3612.616478 784.224166 3145.866571
39 OpA_I2_C2_IM1_WB3_R2.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 22265 5.56625 31.369759 4 932.391321 2089.846029 3150.548929 3533.503489 2338.635397 3167.712477
40 OpA_I2_C2_IM1_WB3_R3.fcs (root,) scatter RectangleGate None root 170989 42.74725 42.747250 1 1070.221959 2018.911086 934.225842 3412.363145 2401.378002 2887.457100
41 OpA_I2_C2_IM1_WB3_R3.fcs (root, scatter) singlets PolygonGate None scatter 140998 35.24950 82.460275 2 1047.282742 2106.394827 919.460220 3397.143329 2430.302715 2878.502707
42 OpA_I2_C2_IM1_WB3_R3.fcs (root, scatter, singlets) cd3 EllipsoidGate None singlets 39169 9.79225 27.779827 3 2315.877559 2039.122338 1130.076376 3570.438694 869.444754 3001.444447
43 OpA_I2_C2_IM1_WB3_R3.fcs (root, scatter, singlets, cd3) cd4 QuadrantGate cd4_cd8_quad cd3 25046 6.26150 63.943425 4 2376.316260 2163.037784 814.797382 3589.654872 587.174382 2993.992196
44 OpA_I2_C2_IM1_WB3_R3.fcs (root, scatter, singlets, cd3) cd8 QuadrantGate cd4_cd8_quad cd3 12854 3.21350 32.816768 4 910.938606 1808.606282 3160.328457 3537.160325 1746.120827 3020.115676

Let's save this as a .csv file so we can use it for downstream analyses later.

In [ ]:
summ.to_csv('data/summary_statistics.csv')

Exporting gated populations as .fcs files¶

Let's say we want to write out a new .fcs file for every sample, that includes only the gated CD8 population (a more realistic scenario would be exporting .fcs files after removing debris, doublets, and dead cells ready for clustering). We can do this by extracting a DataFrame per sample of just the events of interest, converting this to a Sample object, and using the export() method (you may need to create the data/exported directory first).

In [ ]:
for id in session.get_sample_ids():
    samp = session.get_gate_events(id, 'cd8')
    samp = fk.Sample(samp, id)
    samp.export(
        id, 
        source = 'raw', 
        include_metadata = True, 
        directory = 'data/exported'
    )